In [1]:
# Allow us to load `open_cp` without installing
import sys, os.path
sys.path.insert(0, os.path.abspath(".."))
1. Bowers, Johnson, Pease, "Prospective hot-spotting: The future of crime mapping?", Brit. J. Criminol. (2004) 44, 641--658. doi:10.1093/bjc/azh036
2. Johnson et al., "Prospective crime mapping in operational context", Home Office Online Report 19/07, Police online library.
Divide the area of interest into a grid. The grid is used for both the algorithm, and for data visualisation. There is some discussion about reasonable grid sizes. 10m by 10m or 50m by 50m have been used. Later studies in the literature seem to use a somewhat larger grid size, 100m or 150m square.
We select "bandwidths": space/time regions of the data. Common values are to look at events within 400m and the last 2 months (8 weeks). For each grid cell, for each event falling in this range, we compute a weighting for the event, and then sum all the weightings to produce a (un-normalised) "risk intensity" for that cell.
I believe the original paper (1) is unclear on this. The discussion on page 9 shows a formula involving "complete 1/2 grid widths" but does not give details as to how, exactly, such a distance is to be computed. The next paragraph gives a couple of examples which are themselves unclear, as they simply talk about a "neighbouring cell". No formula is given for the total weight, but we can infer one from the examples.
Let $t_i$ be the elapsed time between now and event $i$, and $d_i$ the distance of event $i$ from the centre of the grid cell we are interested in. Then $$ w = \sum_{t_i \leq t_\max, d_i \leq d_\max} \frac{1}{1+d_i} \frac{1}{1+t_i}. $$ For this to make sense, we introduce units:
- $t_i$ is measured in whole weeks elapsed (so an event less than one week old has $t_i = 0$);
- $d_i$ is measured in the number of complete 1/2 grid widths between the event and the centre of the cell.
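As a concrete illustration, here is a minimal (and deliberately inefficient) sketch of formula (1) for a single grid cell, using the unit conventions just described and the commonly quoted bandwidths of 400m and 8 weeks. This is only my reading of the formula, not the library's implementation; all names below are invented for the example.

import datetime
import numpy as np

def cell_risk(cell_centre, events_xy, event_times, now,
              cell_width=50.0, space_bandwidth=400.0, time_bandwidth_weeks=8):
    # Formula (1): sum of 1/(1+d_i) * 1/(1+t_i) over events inside both bandwidths,
    # with t_i in whole weeks and d_i in complete half grid widths (as inferred above).
    risk = 0.0
    for (x, y), t in zip(events_xy, event_times):
        distance = np.hypot(x - cell_centre[0], y - cell_centre[1])
        weeks = (now - t).days // 7
        if distance > space_bandwidth or weeks > time_bandwidth_weeks:
            continue
        d = int(distance // (cell_width / 2))
        risk += 1.0 / (1 + d) / (1 + weeks)
    return risk

# Two events: one nearby and recent, one further away and six whole weeks old.
now = datetime.datetime(2017, 3, 20)
events_xy = [(110.0, 105.0), (300.0, 180.0)]
event_times = [datetime.datetime(2017, 3, 16), datetime.datetime(2017, 2, 1)]
print(cell_risk((125.0, 125.0), events_xy, event_times, now))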
Paper (2) uses a different formula, and gives no examples:
$$ w = \sum_{t_i \leq t_\max, d_i \leq d_\max} \Big( 1 + \frac{1}{d_i} \Big) \frac{1}{t_i}, $$ where we are given only a brief indication of the units of $t_i$ and $d_i$. It is not clear to me that (2) gives the temporal and spatial bandwidths used.
Notice that both weight functions couple the "units" of time and space. For example, if we halve the cell width used, then (roughly speaking) each $d_i$ will double, while the $t_i$ remain unchanged. This results in the time component now having a larger influence on the weight.
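For instance, taking $d_i$ in complete half grid widths as above, an event 100m away and 3 whole weeks old contributes $\frac{1}{1+4}\cdot\frac{1}{1+3} = \frac{1}{20}$ under formula (1) when the cells are 50m wide, but $\frac{1}{1+8}\cdot\frac{1}{1+3} = \frac{1}{36}$ when the cells are 25m wide: the temporal factor is unchanged while the spatial factor shrinks, so the balance between time and space in the weight shifts even though the event itself has not moved.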
Paper (2) introduces a variation of this method:
For the second set of results for the prospective method, for each cell, the crime that confers the most risk is identified and the cell is assigned the risk intensity value generated by that one point.
Again, this is not made entirely clear, but I believe it means that, instead of actually computing the sum above, we compute each summand and take the largest value as the weight.
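Under that reading, the variation is just a switch from summing the per-event terms to taking their maximum. A tiny sketch, using formula (1)'s summands and made-up numbers purely for illustration:

import numpy as np

# Per-event weights w_i = 1/(1+d_i) * 1/(1+t_i) for one cell, with d_i in half grid
# widths and t_i in whole weeks (the conventions assumed above).
d = np.array([0, 3, 7])   # distances of three events from the cell centre
t = np.array([0, 1, 6])   # elapsed whole weeks for the same three events
summands = 1.0 / (1 + d) / (1 + t)

print("sum variant:", summands.sum())   # the risk intensity defined earlier
print("max variant:", summands.max())   # the single event conferring most risk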
The "risk intensity" for each grid cell is computed, and then displayed graphically as relative risk. For example:
When using the risk intensity to make predictions, there are two reasonable choices:
1. Compute the risk intensity once, using today's date, and reuse the resulting map for the whole of the coming prediction period.
2. Recompute the risk intensity, relative to the current date, each time a prediction is needed.
The difference between (1) and (2) is that events may change their time-based weighting (or even fall out of the temporal bandwidth completely). For example, if today is the 20th March and an event occurred on the 14th, we consider it as occurring zero whole weeks ago, and so it contributes a weight of $1/1 = 1$ (in the 1st formula, for example). However, if we recompute the risk for the 22nd March, this event is now one whole week in the past, and so the weight becomes $1/2$.
This issue falls under what I term an "aliasing issue", which comes about because we are taking continuous data and making it discrete: timestamps are reduced to whole weeks, and event locations are assigned to grid cells. It would appear, a priori, that changing the offset of the grid (e.g. moving the whole grid 10m north) could cause a lot of events to jump from one grid cell to another.
We keep the grid for "prediction" purposes, but we allow a large range of "weights" to be plugged in, from various "guesses" as to what exactly the original studies used, to variations of our own making.
Note, however, that this is still ultimately a "discrete" algorithm. We give a variation which generates a continuous kernel (and then bins the result for visualisation / comparison purposes) as a different prediction method, see below.
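For example, a pluggable weight could be as simple as a callable taking the elapsed time and the distance (in the discrete units above) and returning a number. The sketch below is purely illustrative of the idea; it does not claim to reproduce open_cp's actual plug-in interface, and the class name and defaults are invented for this example.

class ClassicStyleWeight:
    # A weight in the spirit of formula (1): 1/(1+dd) * 1/(1+dt) inside the
    # bandwidths, zero outside.  `dt` is the elapsed time in whole weeks and `dd`
    # the distance in half grid widths, following the conventions assumed above.
    def __init__(self, time_bandwidth=8, space_bandwidth=16):
        self.time_bandwidth = time_bandwidth
        self.space_bandwidth = space_bandwidth

    def __call__(self, dt, dd):
        if dt > self.time_bandwidth or dd > self.space_bandwidth:
            return 0.0
        return 1.0 / (1 + dd) / (1 + dt)

weight = ClassicStyleWeight()
print(weight(0, 1), weight(2, 0), weight(10, 0))   # 0.5, 0.333..., 0.0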
In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import open_cp
import open_cp.prohotspot as phs
In [3]:
# Generate some random data
import datetime
times = [datetime.datetime(2017,3,10) + datetime.timedelta(days=np.random.randint(0,10)) for _ in range(20)]
times.sort()
xc = np.random.random(size=20) * 500
yc = np.random.random(size=20) * 500
points = open_cp.TimedPoints.from_coords(times, xc, yc)
In [4]:
region = open_cp.RectangularRegion(0,500, 0,500)
predictor = phs.ProspectiveHotSpot(region)
predictor.data = points
# To be correct, we should also adjust the weight, as now we have changed
# the ratio between space and time.
predictor.grid = 20
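# Use all the data up to the time of the last event, and make the prediction
# for that same moment.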
prediction = predictor.predict(times[-1], times[-1])
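To illustrate the two choices listed earlier: choice (1) would simply reuse `prediction` on later days, while choice (2) recomputes the risk with a later prediction date, so events age (in whole weeks) relative to that new date. Assuming, as the call above suggests, that the second argument of `predict` is the time the prediction is made for, choice (2) might look like:

# Recompute the risk for a prediction two days later, keeping the same data
# cutoff; some events may now be a whole week older than before.
later_prediction = predictor.predict(times[-1], times[-1] + datetime.timedelta(days=2))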
In [5]:
def plot_events_by_week(ax):
    for week in range(2):
        start = times[-1] - datetime.timedelta(days = 7 * week)
        end = times[-1] - datetime.timedelta(days = 7 * week + 7)
        mask = ~( (end < points.timestamps) & (points.timestamps <= start) )
        ax.scatter(np.ma.masked_where(mask, points.xcoords),
                   np.ma.masked_where(mask, points.ycoords),
                   marker="+", color="black", alpha = 1 - week * 0.5)
fig, ax = plt.subplots(figsize=(8,8))
ax.set(xlim=[region.xmin, region.xmax], ylim=[region.ymin, region.ymax])
ax.pcolormesh(*prediction.mesh_data(), prediction.intensity_matrix, cmap="Blues")
plot_events_by_week(ax)
It might seem strange that some events appear to confer so little risk that the cell they fall in does not reach the top 10%. This is because the weight is additive, and so cells which are near a number of different events are weighted strongly. Remember also that we weight recent events more.
Furthermore, because the risk chart uses percentiles while the plot above uses a linear colouring, we are not really comparing like with like. You can see this below, where we plot the (inverse of the) cumulative probability density: the distribution has a heavy tail.
In [6]:
import matplotlib.colors
def make_risk_chart(prediction, ax=None):
    bins = np.array([0.9, 0.95, 0.99])
    binned = np.digitize(prediction.percentile_matrix(), bins)
    masked = np.ma.masked_where(binned == 0, binned)
    fixed_colour = matplotlib.colors.ListedColormap(["blue", "yellow", "red"])
    if ax is None:
        _, ax = plt.subplots(figsize=(8,8))
    ax.set(xlim=[region.xmin, region.xmax], ylim=[region.ymin, region.ymax])
    ax.pcolormesh(*prediction.mesh_data(), masked, cmap=fixed_colour)
    ax.scatter(points.xcoords, points.ycoords, marker="+", color="black")
make_risk_chart(prediction)
In [7]:
data = prediction.intensity_matrix.ravel().copy()
data.sort()
index = len(data) // 100
print("Intensity ranges from {} to {}".format(
np.min(prediction.intensity_matrix), np.max(prediction.intensity_matrix)))
print("1%, 5% and 10% points are {}, {}, {}".format(
data[len(data) - 1 - len(data) // 100],
data[len(data) - 1 - 5 * len(data) // 100],
data[len(data) - 1 - 10 * len(data) // 100] ))
In [8]:
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(np.arange(len(data)) * 100 / (len(data)-1), data)
ax.set_xlabel("Percentage")
ax.set_ylabel("Risk Intensity")
None
In [9]:
cts_predictor = phs.ProspectiveHotSpotContinuous()
cts_predictor.data = points
cts_predictor.grid = 20
cts_prediction = cts_predictor.predict(times[-1], times[-1])
In [10]:
image_size = 250
density = np.empty((image_size, image_size))
for i in range(image_size):
    for j in range(image_size):
        density[j, i] = cts_prediction.risk((i + 0.5) / image_size * 500, (j + 0.5) / image_size * 500)
fig, ax = plt.subplots(figsize=(10,10))
ax.imshow(density, cmap="Blues", extent=(0,500,0,500), origin="lower", interpolation="bilinear")
plot_events_by_week(ax)
ax.set(xlim=[0, 500], ylim=[0, 500])
None
We note that, unlike the "retrospective" hotspotting technique, the kernel does not decay smoothly to zero. This accounts for the "noisier" looking image.
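Concretely, assuming the continuous kernel uses the same $1/(1+d)$-style distance weighting as the grid method, the weight at the edge of the spatial bandwidth is still roughly $1/(1+d_\max)$, well above zero, and is then cut off abruptly outside the bandwidth rather than tapering smoothly to zero.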
In [11]:
grid = open_cp.predictors.GridPredictionArray.from_continuous_prediction_region(cts_prediction, region, 20, 20)
In [12]:
fig, ax = plt.subplots(ncols=2, figsize=(16,7))
ax[0].pcolormesh(*prediction.mesh_data(), prediction.intensity_matrix, cmap="Blues")
ax[1].pcolormesh(*grid.mesh_data(), grid.intensity_matrix, cmap="Blues")
for i, t in enumerate(["Grid based method", "Continuous kernel method"]):
    ax[i].set(xlim=[region.xmin, region.xmax], ylim=[region.ymin, region.ymax])
    plot_events_by_week(ax[i])
    ax[i].set_title("From " + t)
In [13]:
fig, ax = plt.subplots(ncols=2, figsize=(16,7))
make_risk_chart(prediction, ax[0])
make_risk_chart(grid, ax[1])
for i, t in enumerate(["Grid based method", "Continuous kernel method"]):
    ax[i].set_title("From " + t)
So here we see some quite noticeable differences between the purely grid based method and the kernel density estimation method (which is only gridded as a final step).
Notice that this algorithm makes no mention of the time of day when events occur. Paper (2) briefly addresses this, see pages 39--41. They do indeed find evidence that repeat victimisation is linked to a time of day effect. The analysis looks at the 3 shift patterns employed by the police in the study. It ends with the following note:
Thus, as a final change to the system, the predictive model was developed so that it produced different maps for the different shifts, weighting more heavily those events that occurred during the same shift as that for which the prediction was made.
However, no details of these weightings are given. At present, we have not attempted to guess what these weightings are.